<!DOCTYPE html>
<!--
File:      user.html
Author:    Henry Feild
Date:      March 2013
Purpose:   Describes the CLRM user API.

%%COPYRIGHT%%

Version: %%VERSION%%
-->
<html>
<head>
    <script src="js/external/jquery.min.js"></script>
    <script src="js/content-page.js"></script>
</head>
<body>
<div class="toc">
    <h2>Contents</h2>
    <ul>
        <!-- <li><a href="#user">User API (top)</a> -->
        <li><a href="#user:events">Interaction events</a>
        <li><a href="#user:history">Accessing User History</a>
        <ul>
                <li><a href="#user:read-all">Reading all interactions</a>
                <li><a href="#user:read-searches">Reading searches</a>
                <li><a href="#user:read-clicks">Reading clicks</a>
                <li><a href="#user:read-pairs">Reading search-click pairs</a>
        </ul>
        <li><a href="#user:realtime">Accessing Real-Time User Activity</a>
        <li><a href="#user:save">Saving Data</a>
    </ul>
</div>



<h1>User API</h1>

<p>
The User API is available via the <code>api.user</code> namespace. It has 
several sub namespaces:
</p>
<ul>
    <li><code>api.user.history</code> for accessing a user's search interactions
        up until the current moment.
    <li><code>api.user.realTime</code> to listen for new 
        interactions as they happen.
    <li><code>api.user.storage</code> for saving state.
</ul>

<a name="events"></a>
<h2>Interaction events</h2>

<p>
CrowdLogger tracks several user-browser interactions: clicks, queries, tab events, page loads, and page focuses. When any of these events are detected, an object is
created describing the event. This is then broadcasted and logged.
Here's a description of the object fields for each kind of event.
All events have at minimum two fields: <code>id</code> (a unique id used by
the database) and <code>e</code> (for "event", see table below). Currently they
all have a <code>t</code> (timestamp) field, as well, but we may introduce
some new events that don't include a timestamp.
</p>



<ul>
    <li><a href="#user:search">Search</a>
    <li><a href="#user:click">Click</a>
    <li><a href="#user:tabadd">Tab Add</a>
    <li><a href="#user:tabselect">Tab Select</a>
    <li><a href="#user:tabremove">Tab Remove</a>
    <li><a href="#user:pageload">Page Load</a>
    <li><a href="#user:pagefocus">Page Focus</a>
    <li><a href="#user:pageblur">Page Blur</a>
    <li><a href="#user:loggingstatuschange">Logging Status Change</a>
</ul>

<!-- SEARCH -->
<a name="search"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Search</h3>
    A query submitted to a search engine. Currently, only a few search engines
    are supported (Google, Yahoo!, and Bing).

    <h4>Fields</h4>
    <ul>
        <li>{string}    <code>e: 'search'</code>.
        <li>{int}       <code>id</code>: A unique identifier.
        <li>{int}       <code>t</code>: The timestamp.
        <li>{int}       <code>tid</code>: The id of the tab in which the search
            occurred.
        <li>{string}    <code>q</code>: The query.
        <li>{string}    <code>se</code>: The search engine.
        <li>{string}    <code>url</code>: The URL of the results page.
        <li>{int}       <code>srnk</code>: The starting rank of the results.
        <li>{string}    <code>rcnt</code>: The result count as displayed.
        <li>{array of objects} <code>res</code>: A list of the search results.
            Each object in the array represents a single result and has the 
            following structure:
            <ul>
                <li>{int} <code>rnk</code>: The rank.
                <li>{string} <code>ttltxt</code>: The title text.
                <li>{string} <code>ttlurl</code>: The title URL.
                <li>{string} <code>durl</code>: The displayed URL.
                <li>{string} <code>shtml</code>: The summary HTML.
                <li>{array of objects} <code>l</code>: An array of links from
                    the result (including the title):
                    <ul>
                        <li>{string} <code>anc</code>: The anchor text.
                        <li>{string} <code>url</code>: The URL.
                    </ul>
            </ul>

    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>
 e: "search", 
 t: 1363369838589, 
 tid: "1363369830328", 
 q: "html symbols", 
 se: "http://www.bing.com/search", 
 url: "http://www.bing.com/search?q=html+symbols&amp;form=MOZSBR&amp;pc=MOZI",    
 rcnt: "280,000,000 results", 
 srnk: 1, 
 id: 929,
 res: [{ 
     rnk: 1, 
     ttltxt: "&lt;strong&gt;HTML&lt;/strong&gt; &lt;strong&gt;Symbol&lt;/strong&gt; Entities Reference", 
     ttlurl: "http://www.w3schools.com/tags/ref_symbols.asp", 
     durl: "www.w3schools.com/tags/ref_&lt;strong&gt;symbols&lt;/strong&gt;.asp", 
     shtml: "&lt;strong&gt;HTML&lt;/strong&gt; &lt;strong&gt;Symbol&lt;/strong&gt; Entities. This entity reference includes mathematical &lt;strong&gt;symbols&lt;/strong&gt;, Greek &lt;strong&gt;characters&lt;/strong&gt;, various arrows, technical &lt;strong&gt;symbols&lt;/strong&gt; and shapes. Note: Entity names are ...", 
     l: [ { 
        anc: "&lt;strong&gt;HTML&lt;/strong&gt; &lt;strong&gt;Symbol&lt;/strong&gt; Entities Reference", 
        url: "http://www.w3schools.com/tags/ref_symbols.asp" 
     }]
    },
    ...
  ]
}
</pre>
</div>

<!-- CLICK -->
<a name="click"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Click</h3>
    Clicks on URLs, whether search engine result URLs or otherwise.

    <h4>Fields</h4>
    <ul>
        <li>{string}    <code>e: 'click'</code>
        <li>{int}       <code>id</code>: A unique identifier.
        <li>{int}       <code>t</code>: The timestamp.
        <li>{string}    <code>turl</code>: The clicked URL.
        <li>{string}    <code>surl</code>: The URL of the page on which the 
            click occurred.
        <li>{string}    <code>stid</code>: The id of the tab in which the click
            occurred.
        <li>{string}    <code>anc</code>: The anchor text of the link.
        <li>{object}    <code>s</code>: Only present if this was a click from
            a search page. Contains the following fields:
            <ul>
                <li>{string}    <code>q</code>: The associated query.
                <li>{string}    <code>r</code>:  The rank (will be 
                    <code>null</code> if the click was not on a result link 
                    (e.g., on an ad)).
            </ul>
    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>{id: 1,
 e: 'click', 
 t: 1360168716158, 
 turl: 'http://en.wikipedia.org/wiki/Panda', 
 surl: 'https://www.google.com/search',
 s: {
    q: 'panda',
    r: '1'
 }
}
</pre>
</div>

<!-- TAB ADDED -->
<a name="tabadd"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Tab Add</h3>
    Fired any time a tab (or new window) is opened.

    <h4>Fields</h4>
    <ul>
        <li>{string} <code>e: 'tabadd'</code>
        <li>{int} <code>id</code>: A unique identifier.
        <li>{int} <code>t</code>: The timestamp.
        <li>{string} <code>ttid</code>: The target tab id (the one being 
            added).
        <li>{string} <code>turl</code>: The URL loaded in the new tab.
        <li>{string} <code>stid</code>: The source tab id (the one from 
            which the new one was added).
        <li>{string} <code>surl</code>: The URL loaded in the source tab.
    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>{id: 1,
 e: 'tabadd', 
 t: 1360168716158, 
 ttid: '333445',
 turl: 'http://www.cnn.com', 
 stid: '333444',
 surl: 'http://en.wikipedia.org/wiki/Panda'}
</pre>
</div>

<!-- TAB SELECTED -->
<a name="tabselect"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Tab Select</h3>
    Fired any time a tab is selected. This is not incredibly reliable, e.g.,
    it is not fired when switching to a tab from another window.

    <h4>Fields</h4>
    <ul>
        <li>{string} <code>e: 'tabselect'</code>
        <li>{int} <code>id</code>: A unique identifier.
        <li>{int} <code>t</code>: The timestamp.
        <li>{string} <code>tid</code>: The id of the selected tab.
        <li>{string} <code>url</code>: The URL of the page loaded in the 
            selected tab.
    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>{id: 1,
 e: 'tabremove', 
 t: 1360168716158, 
 tid: '333444',
 url: 'http://en.wikipedia.org/wiki/Panda'}
</pre>
</div>

<!-- TAB REMOVED -->
<a name="tabremove"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Tab Remove</h3>
    Fired any time a tab (or window) is closed.

    <h4>Fields</h4>
    <ul>
        <li>{string} <code>e: 'tabremove'</code>
        <li>{int} <code>id</code>: A unique identifier.
        <li>{int} <code>t</code>: The timestamp.
        <li>{string} <code>tid</code>: The id of the removed tab.
    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>{id: 1,
 e: 'tabremove', 
 t: 1360168716158, 
 tid: '333444'}
</pre>
</div>


<!-- PAGE LOAD -->
<a name="pageload"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Page Load</h3>
    Fires any time a page starts to load (it does not actually mean the page
    has finished loading).

    <h4>Fields</h4>
    <ul>
        <li>{string} <code>e: 'pageload'</code>
        <li>{int} <code>id</code>: A unique identifier.
        <li>{int} <code>t</code>: The timestamp.
        <li>{string} <code>tid</code>: The id of the tab in which the page was 
            loaded.
        <li>{string} <code>url</code>: The loaded URL.
    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>{id: 1,
 e: 'pageload', 
 t: 1360168716158, 
 tid: 2k3,
 url: 'http://www.cnn.com'}
</pre>
</div>

<!-- PAGE FOCUS -->
<a name="pagefocus"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Page Focus</h3>
    Fires any time a page is brought into focus.

    <h4>Fields</h4>
    <ul>
        <li>{string} <code>e: 'pagefocus'</code>
        <li>{int} <code>id</code>: A unique identifier.
        <li>{int} <code>t</code>: The timestamp.
        <li>{string} <code>tid</code>: The id of the tab in which the page was 
            focused.
        <li>{string} <code>url</code>: The URL of the focused page.
        <li>{string} <code>ttl</code>: The page's title.
    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>{id: 1,
 e: 'pagefocus', 
 t: 1360168716158, 
 tid: '354235',
 url: 'http://www.cnn.com',
 ttl: 'CNN.com &ndash; Breaking News'}
</pre>
</div>

<!-- PAGE BLUR -->
<a name="pageblur"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Page Blur</h3>
    Fires any time a page looses focus.

    <h4>Fields</h4>
    <ul>
        <li>{string} <code>e: 'pageblur'</code>
        <li>{int} <code>id</code>: A unique identifier.
        <li>{int} <code>t</code>: The timestamp.
        <li>{string} <code>tid</code>: The id of the tab in which the page was 
            focused.
        <li>{string} <code>url</code>: The URL of the focused page.
        <li>{string} <code>ttl</code>: The page's title.
    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>{id: 1,
 e: 'pageblur', 
 t: 1360168717445, 
 tid: '354235',
 url: 'http://www.cnn.com',
 ttl: 'CNN.com &ndash; Breaking News'}
</pre>
</div>

<!-- LOGGING STATUS CHANGE -->
<a name="loggingstatuschange"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Logging Status Change</h3>
    Fires any time the user toggles logging or when the browser is started.

    <h4>Fields</h4>
    <ul>
        <li>{string} <code>e: 'loggingstatuschange'</code>
        <li>{int} <code>id</code>: A unique identifier.
        <li>{int} <code>t</code>: The timestamp.
        <li>{boolean} <code>le</code>: Logging enabled: <code>true</code> or 
            <code>false</code>
    </ul>

    <h4>Example</h4>
<pre class='prettyprint'>{id: 1,
 e: 'loggingstatuschange', 
 t: 1360168716158, 
 le: false}
</pre>
</div>


<!-- User History -->
<a name="history"></a>
<h2>Accessing User History</h2>

<p>
The <code>api.user.history</code> should be used to access a user's browser
interactions up until the current moment. For example, if you want to make a
language model of all queries the user has submitted in the past (assuming they
have been logged). There are several functions available to you:
</p>

<ul>
    <li><a href="#user:read-all">Read all past interactions</a>
    <li><a href="#user:read-searches">Read past searches</a>
    <li><a href="#user:read-clicks">Read past clicks</a>
    <li><a href="#user:read-visits">Read past page visits</a>
    <li><a href="#user:read-sessions">Read past sessions</a>
    <li><a href="#user:read-trails">Read past search and browsing trails</a>
</ul>

<!-- Read log entries. -->
<a name="read-all"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Reading all interactions<br/>
        <code>api.user.history.getInteractionHistory</code></h3>
    <h4>Parameters</h4>
    <ul>
        <li>{object} A list of options:<br/>
        <span class="parameter-requirement">Required</span>
        <ul>
            <li>{function} <code>on_chunk</code>
            <ul>
                Invoked per chunk (see <code>chunk_size</code> below). Chunks are 
                processed asynchronously. It should expect three parameters: 
                <ul>
                    <li>{array of objects} the interaction events 
                    <li>{function} a <code>next</code> function, which, when 
                        invoked, will read in the next chunk asynchronously
                    <li>{function} an <code>abort</code> function, which, when 
                        invoked, will
                        end the reading. This can take up to two parameters:
                        <code>isError</code> (default false) and 
                        <code>errorMsg</code>. If <code>isError == true</code>, then
                        the <code>on_error</code> function will be invoked; otherwise,
                        the <code>on_success</code> function will be (see below).
                </ul>
            </ul>
        </ul>
        <span class="parameter-requirement">Optional</span>
        <ul>
            <li>{function} <code>on_success</code>
            <ul>
                Invoked when everything has been read and processed by 
                <code>on_chunk</code>.
            </ul>

            <li>{function} <code>on_error</code>
            <ul>
                Invoked if there's an error.
            </ul>

            <li>{int} <code>chunk_size</code>
            <ul>
                The size of the chunks to process. E.g.,
                <coode>chunk_size = 50</code> will cause 50 entries to
                be read, stored in an array, and then
                passed to the on_chunk function. If provided, this must be
                between 1 and 500 (default: 250). If <code>chunk_size</code> is
                less than 1, it is set to 250, and if greater than 500, it is set
                to 500.
            </ul>

            <li>{boolean} <code>reverse</code>
            <ul>
                If true, the data will be read in reverse
                order of id. Default is 'false'.
            </ul>

            <li>{int} <code>lower_bound</code>
            <ul>
                The smallest id to retrieve; default: 0
            </ul>

            <li>{int} <code>upper_bound</code>
            <ul>
                The largest id to retrieve; default: -1
                (all ids >= lower_bound are retrieved).
            </ul>
        </ul>
    </ul>

    <h4>Description</h4>
    <p>You can use <code>api.user.history.getInteractionHistory</code> to access
a user's full interaction history: clicks, queries, page loads/focuses, and
tab events.
    </p>



<p>
Lets go over a few common patterns. If you want to <b>read the entire activity 
log</b> in default chunk sizes, you can do something like this:
</p>

<pre class="prettyprint">
// Prints each of the given entries.
function print_entries( entries, next, abort ){
    var i;
    for( i = 0; i &lt; entries.length; i++ ){
        console.log( JSON.stringify(entries[i]) );
    }
    // Read the next chunk.
    next();
}

// Read the log and print the entries.
api.user.history.getInteractionHistory({
    on_chunk: print_entries
});
</pre>

<p>
Now let's suppose you are <b>populating a search
history web page</b>. If the user hasn't scrolled to the bottom of the page yet,
there's no reason to load any additional content. So, we want to chunk the data
and traverse it in reverse order (most recent log entries first), but only call
<code>next</code> when we need to. 
</p>

<pre class="prettyprint">
var body = jQuery('&lt;div&gt;');

// Appends a set of entries to an html page.
function appendEntriesToPage( entries, next, abort ){
    var i, entryElm;

    // Append the entries as spans of text.
    for( i = 0; i &lt; entries.length; i++ ){
        entryElm = jQuery('&lt;span&gt;').text(entries[i].e+'@'+entries[i].t);
        body.append(entryElm);
    }

    // When the last element comes into view, read the next chunk. 
    // The 'next' function knows to call this function again, so 
    // we don't have to pass any additional information.
    function onFocus(e){
        jQuery(this).unbind('focus', onFocus);
        next();
    }
    if( entryElm ){ entryElm.focus(onFocus); }
}

// Read the log, reading entries in 50-entry chunks, and add them to the
// body element. We're reading in reverse, so we set reverse: true.
api.user.history.getInteractionHistory({
    on_chunk: appendEntriesToPage,
    chunk_size: 50,
    reverse: true
});
</pre>

<p>
The other options are useful for various situations. The <code>on_success</code>
is convenient if you need to do something once all the data has been read
and processed. Note that this call is only made after the last invocation of
<code>next</code> (when you call <code>next</code> and there is no more data
to read) or if you invoke <code>abort()</code> or <code>abort(false)</code>. 
The <code>on_error</code> function will report any errors that occur internally
or if you invoke <code>abort(true)</code> or 
<code>abort(true, 'error message')</code>.
The <code>upper_bound</code> and <code>lower_bound</code> 
options are handy if you know the minimum or maximum ids of the entries you 
would like to retrieve&mdash;after all, why read in everything if you don't
need to?
</p>

</div>

<!-- Read past searches. -->
<a name="read-searches"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Reading all interactions<br/>
        <code>api.user.history.getPastSearches</code></h3>
    <h4>Parameters</h4>

    <h4>Description</h4>
    <p>

    </p>
</div>

<!-- Read past clicks. -->
<a name="read-clicks"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Reading all interactions<br/>
        <code>api.user.history.getPastClicks</code></h3>
    <h4>Parameters</h4>

    <h4>Description</h4>
    <p>

    </p>
</div>

<!-- Read past page visits. -->
<a name="read-visits"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Reading all interactions<br/>
        <code>api.user.history.getPastPageVisits</code></h3>
    <h4>Parameters</h4>

    <h4>Description</h4>
    <p>

    </p>
</div>


<!-- Read past sessions (queries and sets of visited pages). -->
<a name="read-sessions"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Read past search sets (queries and sets of visited pages)<br/>
        <code>api.user.history.getPastSessions</code></h3>
    <h4>Parameters</h4>

    <h4>Description</h4>
    <p>
        Groups interaction events into sessions, where a session boundary is
        defined as any duration between consecutive interaction events 
        equal to or greater than <code>sessionThreshold</code> minutes.
    </p>
</div>


<!-- Read past search and browsing trails. -->
<a name="read-trails"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Read past search and browsing trails<br/>
        <code>api.user.history.getPastSearchTrails</code></h3>
    <h4>Parameters</h4>

    <h4>Description</h4>
    <p>
        Groups 'related' events. A group starts with one of: 
        <ul>
            <li>a search
            <li>an independent page visit (i.e., a page load not stemming 
                from a click).
            <li>the first event in a new session
        </ul>
    </p>
    <p>
        All actions following one of the start events is grouped with that
        start event until a new start event is reached or the session ends
        (determined by at least <code>sessionThreshold</code> minutes of 
        inactivity).
    </p>
</div>


<a name="realtime"></a>
<h2>Accessing Real-time User Activity</h2>
<ul>
    <li><code><a href="#user:addlisteners">api.user.realTime.addActivityListeners</a></code>
        <li><code><a href="#user:removelisteners">api.user.realTime.removeActivityListeners</a></code>
</ul>

<a name="addlisteners"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Attaching listeners<br/>
        <code>api.user.realTime.addActivityListeners</code></h3>

    <h4>Parameters</h4>
    {object} A map of event listeners:
    <ul>
        <li>{function} <code>query-entered</code>
            <ul>The callback for new search events.</ul>

        <li>{function} <code>link-clicked</code>
            <ul>The callback for new click events.</ul>

        <li>{function} <code>page-loaded</code>
            <ul>The callback for new page load events.</ul>

        <li>{function} <code>page-focused</code>
            <ul>The callback for new page focus events.</ul>
    </ul>
    <p>
    Note that each of the callbacks should expect two arguments:
    <ul>
        <li>{object} A DOM event -- ignore this.
        <li>{object} The interaction event data (see <a href="#user:events">Interaction events</a> above).
    </ul>
    </p>

    <h4>Description</h4>
    <p>
        You can use <code>api.user.realTime.addActivityListeners</code> to 
        access new interactions as they happen.
    </p>
</div>

<a name="removelisteners"></a>
<div class="function">
    <div class="top-top top-tag"><a href="#">top</a></div>
    <div class="top-bottom top-tag"><a href="#">top</a></div>
    <h3>Removing listeners<br/>
        <code>api.user.realTime.removeActivityListeners</code></h3>

    <h4>Parameters</h4>
    {object} A map of event listeners:
    <ul>
        <li>{function} <code>query-entered</code>
            <ul>The callback for new search events.</ul>

        <li>{function} <code>link-clicked</code>
            <ul>The callback for new click events.</ul>

        <li>{function} <code>page-loaded</code>
            <ul>The callback for new page load events.</ul>

        <li>{function} <code>page-focused</code>
            <ul>The callback for new page focus events.</ul>
    </ul>

    <h4>Description</h4>
    <p>
        It's a good idea to remove listeners when a CLRM is unloaded. You can
        use <code>api.user.realTime.removeActivityListeners</code> exactly
        as you used its <code>addActivityListeners</code> counterpart. Invoking
        this function will prevent the given functions from receiving events
        in the future.
    </p>
</div>





</body>
</html>